[X86] Re-enable DA8W4 path on X86 CPU #4033
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4033
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures) As of commit f58c16f with merge base 3d02561.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR re-enables the DA8W4 (dynamic int8 activation + int4 weight) CPU path on x86 by extending Int4OpaqueTensor to support DA8W4 packing/execution and adding a new quantization config and unit tests for the workflow.
Changes:
- Add DA8W4 weight quantization + prepack (`from_hp_da8w4`) and a DA8W4 `aten.linear` dispatch path in `Int4OpaqueTensor`.
- Introduce `Int8DynamicActInt4WeightOpaqueTensorConfig` and its module transform to apply DA8W4 quantization using `Int4OpaqueTensor`.
- Restore/expand the DA8W4 CPU unit tests (`test/quantization/test_da8w4_cpu.py`) and export the new config in the package `__init__`.
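For intuition, the "dynamic int8 activation" half of DA8W4 computes a fresh scale per activation row at inference time rather than calibrating it ahead of time. A minimal pure-Python sketch of symmetric per-row int8 quantization (illustrative only, not the torchao kernel; function names are hypothetical):

```python
# Sketch of dynamic symmetric int8 activation quantization, the "DA8" half
# of DA8W4. The real path packs int4 weights and calls da8w4_linear_cpu;
# this only illustrates the runtime (dynamic) scale computation.

def quantize_row_int8(row):
    """Quantize one activation row to int8 with a scale computed at runtime."""
    amax = max(abs(v) for v in row) or 1.0  # avoid divide-by-zero on all-zero rows
    scale = amax / 127.0                    # symmetric: zero-point is 0
    q = [max(-128, min(127, round(v / scale))) for v in row]
    return q, scale

def dequantize_row(q, scale):
    """Recover an approximation of the original row."""
    return [v * scale for v in q]

row = [0.6, -1.0, 0.25]
q, scale = quantize_row_int8(row)
deq = dequantize_row(q, scale)
```

Because the scale is derived from each incoming batch, no calibration data is needed, which is what distinguishes the dynamic-activation configs from static ones.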
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| torchao/prototype/int4_opaque_tensor/int4_opaque_tensor.py | Adds DA8W4 weight quantize+prepack and dynamic-activation linear implementation using da8w4_linear_cpu. |
| torchao/prototype/int4_opaque_tensor/inference_workflow.py | Adds a new DA8W4 config + quantize-module handler for Int4OpaqueTensor. |
| torchao/prototype/int4_opaque_tensor/__init__.py | Exposes the new DA8W4 config in the public prototype package API. |
| test/quantization/test_da8w4_cpu.py | Adds DA8W4 CPU tests validating compilation/codegen and basic accuracy. |
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
```python
# DA8W4 path: dynamic int8 activation + int4 weight
if weight_tensor.act_mapping_type is not None:
    if weight_tensor.act_mapping_type == MappingType.SYMMETRIC:
```
`act_mapping_type` is stored on `Int4OpaqueTensor` as a string (`"symmetric"`/`"asymmetric"`), but this dispatch compares it to `MappingType.SYMMETRIC`. That condition can never be true, so the symmetric-path gate is a no-op, and the code is inconsistent with `_da8w4_linear` (which checks the string). Make the representation consistent: either store a `MappingType` in the tensor attribute and compare against `MappingType.*`, or keep the string and compare against `"symmetric"`.
Suggested change:

```diff
-if weight_tensor.act_mapping_type == MappingType.SYMMETRIC:
+if weight_tensor.act_mapping_type == "symmetric":
```
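The mismatch is easy to reproduce: in Python, a plain `str` never compares equal to an `Enum` member, even when the member's value is that exact string. A stand-alone sketch (the `MappingType` here is a local stand-in for torchao's enum, and `normalize_act_mapping` is a hypothetical helper):

```python
from enum import Enum

class MappingType(Enum):  # stand-in for torchao.quantization.MappingType
    SYMMETRIC = "symmetric"
    ASYMMETRIC = "asymmetric"

# A str never equals an Enum member, even when the member's value matches,
# which is why the MappingType.SYMMETRIC gate above can never fire:
str_vs_member = ("symmetric" == MappingType.SYMMETRIC)        # always False
str_vs_value = ("symmetric" == MappingType.SYMMETRIC.value)   # True

def normalize_act_mapping(m):
    """Coerce either representation to MappingType so comparisons are safe."""
    return MappingType(m) if isinstance(m, str) else m
```

Normalizing once at construction time (or in `__init__` of the tensor subclass) keeps every downstream comparison consistent.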
```python
if config.set_inductor_config:
    torchao.quantization.utils.recommended_inductor_config_setter()

assert hasattr(module, "weight"), (
    "applying DA8W4 quant requires module to have weight attribute"
    + f" but {module} does not have one"
)
```
The DA8W4 module transform quantizes/prepacks weights unconditionally. If the DA8W4 CPU kernels aren't built/registered (or the running PyTorch is too old for the needed path), the transform will still replace `module.weight` with an `Int4OpaqueTensor`, and the first `linear()` call will then fail at runtime. Consider adding an early guard here (similar to the one in the unit test) that checks kernel availability via `torch._C._dispatch_dump("torchao::da8w4_linear_cpu")` and `torch_version_at_least("2.7.0")` (2.8.0 for the symmetric path) before applying the transform; otherwise log a warning and return the original module.
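One way to structure such a guard is to factor the decision out of the transform so the environment probes are injectable and unit-testable. In this sketch, `kernel_registered` and `version_at_least` are hypothetical stand-ins for the results of `torch._C._dispatch_dump("torchao::da8w4_linear_cpu")` and torchao's `torch_version_at_least` helper; the version thresholds mirror the ones named in the comment above:

```python
def da8w4_cpu_supported(kernel_registered, version_at_least, symmetric=False):
    """Decide whether the DA8W4 CPU transform should be applied.

    kernel_registered: bool, e.g. derived from probing the dispatcher for
        torchao::da8w4_linear_cpu (hypothetical wiring, not torchao code).
    version_at_least: callable(str) -> bool, e.g. torch_version_at_least.
    symmetric: the symmetric act-mapping path needs a newer PyTorch (2.8.0).
    """
    if not kernel_registered:
        return False  # kernels not built/registered: leave the module as-is
    required = "2.8.0" if symmetric else "2.7.0"
    return version_at_least(required)

# Stand-in version check simulating a PyTorch 2.7.0 environment.
def fake_version_at_least(v, current=(2, 7, 0)):
    return current >= tuple(int(p) for p in v.split("."))
```

With this shape, the transform can call `da8w4_cpu_supported(...)` up front and fall back to returning the original module (with a log message) instead of installing a weight that will fail on first use.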
Summary
This PR re-enables the DA8W4 path on X86 CPU with Int8DynamicActInt4WeightOpaqueTensorConfig and Int4OpaqueTensor, and updates the unit tests in test/quantization/test_da8w4_cpu.py.
Test plan
python test/quantization/test_da8w4_cpu.py